library(tidyverse)
library(purrr)
library(lubridate)
library(kableExtra)
devtools::load_all("../")
What is betweenness centrality?
For a given node, the sum, over every pair of other nodes, of the fraction of shortest paths between that pair that pass through the node.
What it describes in the network:
Betweenness highlights individuals who facilitate direct and indirect interactions between nodes. It captures how well a node bridges pairs of nodes, but not necessarily how well it bridges two clusters or, in our case, two political factions.
Why does this fall short when considering our research questions?
Betweenness needs to be modified to capture the moderating behavior between Government and Opposition. It is also a global measure rather than a direct indication of local bridging behavior. Standard betweenness does not take edge weights into account, and it assumes that shortest paths are the relevant paths.
Formula: \[ C_B(v) = \sum_{s \neq v \neq t}\frac{\sigma_{st}(v)}{\sigma_{st}}\]
Where:
- \(\sigma_{st}\): Total number of shortest paths from node \(s\) to node \(t\).
- \(\sigma_{st}(v)\): Number of shortest paths from node \(s\) to \(t\) that pass through \(v\).
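As a small worked example (an illustrative graph, not drawn from our data), take the four-node cycle a-b-c-d-a and count each unordered pair once. The pairs that exclude \(b\) are \((a, c)\), \((a, d)\) and \((c, d)\). Only \((a, c)\) has a shortest path through \(b\), and that is one of its two shortest paths (the other goes through \(d\)):

\[ C_B(b) = \frac{\sigma_{ac}(b)}{\sigma_{ac}} + \frac{\sigma_{ad}(b)}{\sigma_{ad}} + \frac{\sigma_{cd}(b)}{\sigma_{cd}} = \frac{1}{2} + \frac{0}{1} + \frac{0}{1} = \frac{1}{2} \]

By symmetry every node in the cycle scores \(\frac{1}{2}\), which shows how a node earns only partial credit when alternative shortest paths bypass it.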
What are we altering?
Instead of considering the shortest path between every pair of nodes, we only use distinct pairs of nodes with opposing affiliation (Government and Opposition).
Here we only evaluate the ability of a node to bridge the gap between the factions.
We ignore connectivity within each faction, since this is not relevant to our research.
Formula: \[ C_B(v) = \sum_{o \neq v \neq g}\frac{\sigma_{og}(v)}{\sigma_{og}}\]
Where:
- \(\sigma_{og}\): Total number of shortest paths from node \(o\), an opposition node, to node \(g\), a government node.
- \(\sigma_{og}(v)\): Number of shortest paths from node \(o\) to \(g\) that pass through \(v\).
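The restricted sum above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the project's own function: `bfs_counts()` and `cross_betweenness()` are hypothetical names, the graph is an unweighted, undirected toy network stored as an adjacency list, and the affiliations are made up. It relies on the fact that \(v\) lies on a shortest path from \(o\) to \(g\) exactly when \(d(o, v) + d(v, g) = d(o, g)\), in which case \(\sigma_{og}(v) = \sigma_{ov} \cdot \sigma_{vg}\).

```r
# Hypothetical helper: breadth-first search from `source` over an adjacency
# list, returning hop distances and shortest-path counts to every node.
bfs_counts <- function(adj, source) {
  n <- length(adj)
  dist <- rep(Inf, n)
  sigma <- rep(0, n)
  dist[source] <- 0
  sigma[source] <- 1
  queue <- source
  while (length(queue) > 0) {
    u <- queue[1]
    queue <- queue[-1]
    for (w in adj[[u]]) {
      if (is.infinite(dist[w])) {       # first visit: w is one hop past u
        dist[w] <- dist[u] + 1
        queue <- c(queue, w)
      }
      if (dist[w] == dist[u] + 1) {     # u precedes w on a shortest path
        sigma[w] <- sigma[w] + sigma[u]
      }
    }
  }
  list(dist = dist, sigma = sigma)
}

# Hypothetical sketch of cross-betweenness: sum only over pairs (o, g) with
# o in the Opposition and g in the Government.
cross_betweenness <- function(adj, affiliation) {
  n <- length(adj)
  score <- numeric(n)
  counts <- lapply(seq_len(n), function(s) bfs_counts(adj, s))
  opp <- which(affiliation == "Opposition")
  gov <- which(affiliation == "Government")
  for (o in opp) {
    for (g in gov) {
      d_og <- counts[[o]]$dist[g]
      if (is.infinite(d_og)) next       # unreachable pair contributes nothing
      for (v in setdiff(seq_len(n), c(o, g))) {
        on_path <- counts[[o]]$dist[v] + counts[[v]]$dist[g] == d_og
        if (on_path) {
          through_v <- counts[[o]]$sigma[v] * counts[[v]]$sigma[g]
          score[v] <- score[v] + through_v / counts[[o]]$sigma[g]
        }
      }
    }
  }
  score
}

# Toy network: edges 1-2, 1-3, 1-5, 2-4, 3-4 (made-up data).
adj <- list(c(2, 3, 5), c(1, 4), c(1, 4), c(2, 3), c(1))
affiliation <- c("Opposition", "Expert", "Government", "Government", "Opposition")
scores <- cross_betweenness(adj, affiliation)
print(scores)
```

In this toy network node 1 scores highest because both paths from the peripheral opposition node 5 to the government nodes run through it. Nodes of any affiliation, including faction members themselves, can accumulate score as intermediaries; only the endpoints of each pair are restricted.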
Concerns / Potential Pitfalls:
Large, dense clusters can skew the centrality, inflating the scores of nodes because they lie on multiple paths within the same cluster. This could mask who is genuinely “important” as a moderator.
Perhaps indirect interactions are not as important as direct interactions, in which case the global nature of this method doesn’t help us in our research.
Below we share a glimpse of the 20 highest cross-betweenness results. Each result is currently calculated for a single year at a time, only because our function still needs to be optimized (happening soon).
The highest cross-betweenness results are exhibited by Walerian Pańko in 1982 and Kazimierz Obsadny in 1987.
# loading results of candidate variant #1 "Cross Betweenness"
cb_by_year <- read_csv("cb_initial_results.csv")
cb_by_year |> arrange(desc(CrossBetweenness)) |>
slice_max(order_by = CrossBetweenness, n = 20) |>
kable(format = "html", caption = "Highest 20 Cross-Betweenness Results") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
| Member.ID | CrossBetweenness | Start.Date | End.Date | Full.Name | RT.Affiliation |
|---|---|---|---|---|---|
| MEM0142 | 14161.798 | 1982-01-01 | 1982-12-31 | Walerian Pańko | Opposition |
| MEM0272 | 12024.596 | 1987-01-01 | 1987-12-31 | Kazimierz Obsadny | Government |
| MEM0229 | 5354.350 | 1985-01-01 | 1985-12-31 | Andrzej Ziabicki | Expert |
| MEM0247 | 4597.045 | 1989-01-01 | 1989-12-31 | Alfred Miodowicz | Government |
| MEM0229 | 4252.025 | 1983-01-01 | 1983-12-31 | Andrzej Ziabicki | Expert |
| MEM0229 | 4196.185 | 1984-01-01 | 1984-12-31 | Andrzej Ziabicki | Expert |
| MEM0230 | 4154.471 | 1989-01-01 | 1989-12-31 | Tadeusz Zieliński | Opposition |
| MEM0230 | 4036.975 | 1988-01-01 | 1988-12-31 | Tadeusz Zieliński | Opposition |
| MEM0247 | 4028.795 | 1988-01-01 | 1988-12-31 | Alfred Miodowicz | Government |
| MEM0247 | 3922.914 | 1986-01-01 | 1986-12-31 | Alfred Miodowicz | Government |
| MEM0247 | 3381.670 | 1984-01-01 | 1984-12-31 | Alfred Miodowicz | Government |
| MEM0247 | 3283.586 | 1985-01-01 | 1985-12-31 | Alfred Miodowicz | Government |
| MEM0252 | 3211.582 | 1986-01-01 | 1986-12-31 | Władysław Siła-Nowicki | Government |
| MEM0084 | 3137.443 | 1989-01-01 | 1989-12-31 | Stefan Jurczak | Opposition |
| MEM0252 | 3098.462 | 1987-01-01 | 1987-12-31 | Władysław Siła-Nowicki | Government |
| MEM0084 | 3068.911 | 1986-01-01 | 1986-12-31 | Stefan Jurczak | Opposition |
| MEM0115 | 3043.500 | 1979-01-01 | 1979-12-31 | Wojciech Lamentowicz | Opposition |
| MEM0272 | 2991.712 | 1986-01-01 | 1986-12-31 | Kazimierz Obsadny | Government |
| MEM0115 | 2888.576 | 1978-01-01 | 1978-12-31 | Wojciech Lamentowicz | Opposition |
| MEM0272 | 2785.655 | 1989-01-01 | 1989-12-31 | Kazimierz Obsadny | Government |
When we compare Jacek Kuroń and Lech Wałęsa as Dr. Bodwin suggested, we see their cross-betweenness plots exhibit the predicted inversion.
# Comparing Walesa and Kuron as suggested by Bodwin
cb_by_year |>
filter(Full.Name %in% c("Jacek Kuroń", "Lech Wałęsa")) |>
ggplot(mapping = aes(x = Start.Date,
y = CrossBetweenness,
group = Member.ID,
color = Full.Name)) +
geom_line(linewidth = .75) +
geom_point() +
labs(title = 'Examining Cross-Betweenness in Kuroń and Wałęsa',
x = 'Year',
y = 'Cross-Betweenness',
color = '') +
theme_bw() +
theme(plot.title = element_text(face = "bold", size = 14),
legend.text = element_text(size = 12))
Next, we find the individuals with the highest cross-betweenness on average. Plotting their scores, we see consistently high cross-betweenness from Andrzej Ziabicki and Jan Waleczek, while Stefan Jurczak and Tadeusz Zieliński spike in the later years.
# I would like to see what individuals have the highest cross-betweenness on average
# filter the data down to include only these
# then visualize
top_5 <- cb_by_year |>
group_by(Member.ID) |>
summarise(avg_cb = mean(CrossBetweenness),
Full.Name = first(Full.Name),
RT.Affiliation = first(RT.Affiliation)) |>
arrange(desc(avg_cb)) |>
slice_max(order_by = avg_cb, n = 5)
top_5 |>
kable(format = "html", caption = "Highest 5 Average Cross-Betweenness Over Time") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
| Member.ID | avg_cb | Full.Name | RT.Affiliation |
|---|---|---|---|
| MEM0272 | 1422.9342 | Kazimierz Obsadny | Government |
| MEM0229 | 1155.6874 | Andrzej Ziabicki | Expert |
| MEM0230 | 1059.0505 | Tadeusz Zieliński | Opposition |
| MEM0084 | 1025.1236 | Stefan Jurczak | Opposition |
| MEM0462 | 923.2615 | Jan Waleczek | Government |
cb_top_5_plot <- cb_by_year |>
semi_join(top_5, by = 'Member.ID')
cb_top_5_plot |>
ggplot(aes(x = Start.Date, y = CrossBetweenness, color = Full.Name)) +
geom_line(linewidth = 0.75) +
# geom_point() +
theme_bw() +
labs(title = "Cross-Betweenness For Highest Average CB Individuals",
x = "Year",
y = "Cross-Betweenness",
color = "") +
theme(plot.title = element_text(face = "bold", size = 18),
legend.text = element_text(size = 12))
Here we look at the five experts with the highest average cross-betweenness scores.
top_5_experts <- cb_by_year |>
filter(RT.Affiliation == "Expert") |>
group_by(Member.ID) |>
summarise(avg_cb = mean(CrossBetweenness),
Full.Name = first(Full.Name)) |>
arrange(desc(avg_cb)) |>
slice_max(order_by = avg_cb, n = 5)
cb_top_5_expert_plot <- cb_by_year |>
semi_join(top_5_experts, by = 'Member.ID')
top_5_experts |>
kable(format = "html", caption = "Highest Average Cross-Betweenness Experts Over Time") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
| Member.ID | avg_cb | Full.Name |
|---|---|---|
| MEM0229 | 1155.6874 | Andrzej Ziabicki |
| MEM0139 | 820.8965 | Edward Olszewski |
| MEM0293 | 428.7067 | Adam Lipowski |
| MEM0211 | 190.4215 | Jerzy Wertenstein-Żuławski |
| MEM0572 | 177.3812 | Maciej Szumowski |
cb_top_5_expert_plot |>
ggplot(aes(x = Start.Date, y = CrossBetweenness, color = Full.Name)) +
geom_line(linewidth = 0.75) +
# geom_point() +
theme_bw() +
labs(title = "Cross-Betweenness For Highest Average CB Experts",
x = "Year",
y = "Cross-Betweenness",
color = "") +
theme(plot.title = element_text(face = "bold", size = 18),
legend.text = element_text(size = 12))
# Comparing Pańko and Obsadny
cb_by_year |>
filter(Full.Name %in% c("Walerian Pańko", "Kazimierz Obsadny")) |>
ggplot(mapping = aes(x = Start.Date,
y = CrossBetweenness,
group = Member.ID,
color = Full.Name)) +
geom_line(linewidth = .75) +
geom_point() +
labs(title = 'Examining Cross-Betweenness in Pańko and Obsadny',
x = 'Year',
y = 'Cross-Betweenness',
color = '') +
theme_bw() +
theme(plot.title = element_text(face = "bold", size = 14),
legend.text = element_text(size = 12))
These high spikes definitely pique our interest. Pańko’s spike in 1982 coincides with his departure from PZPR. We took a look at the network app around this time and saw behavior in the graph that seems to validate our metric calculation.
Dr. Domber also suggested taking a look at two individuals who may be good examples of potential moderation: Władysław Siła-Nowicki and Wojciech Lamentowicz.
# Comparing Władysław Siła-Nowicki and Wojciech Lamentowicz
cb_by_year |>
filter(Member.ID %in% c("MEM0252", "MEM0115")) |>
ggplot(mapping = aes(x = Start.Date,
y = CrossBetweenness,
group = Member.ID,
color = Full.Name)) +
geom_line(linewidth = .75) +
geom_point() +
labs(title = 'Examining Cross-Betweenness in Siła-Nowicki and Lamentowicz',
x = 'Year',
y = 'Cross-Betweenness',
color = '') +
theme_bw() +
theme(plot.title = element_text(face = "bold", size = 14),
legend.text = element_text(size = 12))
What are we altering?
We still consider the shortest paths between pairs of nodes of opposing factions, but to combat the score inflation from large organizations, we introduce a normalizing factor.
We divide each pair's contribution by the product of the sizes of the clusters containing its two endpoint nodes.
Formula: \[ C_B(v) = \sum_{o \neq v \neq g}\frac{\sigma_{og}(v)}{\sigma_{og}} \cdot \frac{1}{|C_o| \cdot |C_g|}\]
Where:
- \(\sigma_{og}\): Total number of shortest paths from node \(o\), an opposition node, to node \(g\), a government node.
- \(\sigma_{og}(v)\): Number of shortest paths from node \(o\) to \(g\) that pass through \(v\).
- \(|C_o|\): Size of the cluster containing node \(o\).
- \(|C_g|\): Size of the cluster containing node \(g\).
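To make the effect of the normalizing factor concrete, here is a small arithmetic sketch in base R. All numbers are made up for illustration: the `pairs` data frame holds hypothetical per-pair fractions \(\sigma_{og}(v) / \sigma_{og}\) for a single node \(v\), together with assumed cluster sizes for each endpoint.

```r
# Hypothetical per-pair contributions for one node v (not real results).
pairs <- data.frame(
  o        = c("o1", "o2", "o3"),
  g        = c("g1", "g1", "g2"),
  fraction = c(1, 0.5, 1),   # sigma_og(v) / sigma_og
  size_o   = c(10, 10, 2),   # |C_o|: size of the cluster containing o
  size_g   = c(5, 5, 4)      # |C_g|: size of the cluster containing g
)

# Unnormalized cross-betweenness: plain sum of the fractions.
unnormalized <- sum(pairs$fraction)

# Normalized variant: each pair is scaled by 1 / (|C_o| * |C_g|).
weighted <- pairs$fraction / (pairs$size_o * pairs$size_g)
normalized <- sum(weighted)

print(unnormalized)
print(normalized)
print(round(weighted / normalized, 3))  # share of the score from each pair
```

With these made-up sizes, the single pair drawn from the two small clusters supplies most of the normalized score, even though it contributes no more than the first pair before normalization.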
Concerns / Potential Pitfalls:
We are still considering indirect interactions here, which may or may not be appropriate.
While mitigating against large clusters skewing the results, we may be giving undue influence to smaller clusters. Perhaps adjusting the normalization by some factor could help.
- A variant that adjusts the normalization factor.
- A variant that explores a decay factor to limit how much indirect interactions contribute.
- A variant that considers only direct paths.
- Variants of other standard metrics, such as eigenvector centrality.
- Exploring the ratio idea.